Extending the STTS for the annotation of spoken language

نویسندگان

  • Ines Rehbein
  • Sören Schalowski
چکیده

This paper presents an extension to the Stuttgart-Tübingen TagSet, the standard part-of-speech tag set for German, for the annotation of spoken language. The additional tags deal with hesitations, backchannel signals, interruptions, onomatopoeia and uninterpretable material. They allow one to capture phenomena specific to spoken language while, at the same time, preserving inter-operability with already existing corpora of written language.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The 8 th Linguistic Annotation Workshop in conjunction with COLING 2014

Part-of-speech tagging (POS-tagging) of spoken data requires different means of annotation than POS-tagging of written and edited texts. In order to capture the features of German spoken language, a distinct tagset is needed to respond to the kinds of elements which only occur in speech. In order to create such a coherent tagset the most prominent phenomena of spoken language need to be analyze...

متن کامل

STTS 2.0? Improving the Tagset for the Part-of-Speech-Tagging of German Spoken Data

Part-of-speech tagging (POS-tagging) of spoken data requires different means of annotation than POS-tagging of written and edited texts. In order to capture the features of German spoken language, a distinct tagset is needed to respond to the kinds of elements which only occur in speech. In order to create such a coherent tagset the most prominent phenomena of spoken language need to be analyze...

متن کامل

STTS goes Kiez - Experiments on Annotating and Tagging Urban Youth Language

The Stuttgart-Tübingen Tag Set (STTS) (Schiller et al., 1995) has long been established as a quasi-standard for part-of-speech (POS) tagging of German. It has been used, with minor modifications, for the annotation of three German newspaper treebanks, the NEGRA treebank (Skut et al., 1997), the TiGer treebank (Brants et al., 2002) and the TüBa-D/Z (Telljohann et al., 2004). One major drawback, ...

متن کامل

FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German

In this paper, we present a GOLD standard of part-of-speech tagged transcripts of spoken German. The GOLD standard data consists of four annotation layers – transcription (modified orthography), normalization (standard orthography), lemmatization and POS tags – all of which have undergone careful manual quality control. It comes with guidelines for the manual POS annotation of transcripts of Ge...

متن کامل

The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud

The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud N. Bagheri, M.A. E. Abbasi, Ph.D. M. GeramiPour, Ph.D. The present study was conducted to investigate the impact of language learning activities on development of spoken language in 5-6-year-old children at private preschool center...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012